Abstract

Scientific discovery is incremental. The Merriam-Webster definition of 'Scientific Method' is "principles and 
procedures for the systematic pursuit of knowledge involving the recognition and formulation of a problem, the 
collection of data through observation and experiment, and the formulation and testing of hypotheses". Scientists 
are taught to be excellent observers, as observations create questions, which in turn generate hypotheses. After 
centuries of science we tend to assume that we have enough observations to drive science, and enable the small 
steps and giant leaps which lead to theories and subsequent testable hypotheses. One excellent example of this is 
Charles Darwin's Voyage of the Beagle, which was essentially an opportunistic survey of biodiversity. Today, 
obtaining funding for even small-scale surveys of life on Earth is difficult, but few dispute the importance of the 
theory that was generated by Darwin from his observations made during this epic journey. However, these 
observations, even combined with the parallel work of Alfred Russel Wallace at around the same time, have still 
not generated an indisputable 'law of biology'. The fact that evolution remains a 'theory', at least to the general 
public, suggests that surveys for new data need to be taken to a new level. 



Letter to the editor 

One of the most comprehensive and most important 
contemporary surveys has been the recently completed 
Census of Marine Life (Census; http://www.coml.org). 
This ten-year initiative involved 2,700 scientists from 
more than 80 countries and cost in excess of US$650 mil- 
lion. The Census was driven by a fundamental hypothesis, 
that 'there exist fundamental gaps in our knowledge and 
understanding of the biology of the oceans and the subse- 
quent functioning of this system'. It has often been said 
that the absence of knowledge should be enough justifica- 
tion for exploration. The fact that the Census identified 
more than 6000 potentially new species and resulted in 
more than 2600 scientific publications validates the 
hypothesis. The Census community observed a potential 
gap in knowledge about biodiversity on our planet and 
wanted to fill it. Indeed this was driven not just by the 
scientific community, but also by public pressure for dis- 
covery. People are very responsive to new discovery, as 
indicated by the media response to new species found in 
the Census, and the questions asked of the Census by the 
public, e.g. 'how much biodiversity is there?'; 'why is there 
so much?'; 'how did it get there?'; and 'how much biodiversity 
is enough?' [1]. What has been striking is the 
reinforcement of our original theory: that these voluminous 
observations have generated innumerable new 
hypotheses. Nowhere is this more true than in the microbial 
component of the Census, the International Census of Marine 
Microbes (ICoMM; http://icomm.mbl.edu/), which has 
been one of the most comprehensive studies of microbial 
diversity ever accomplished. 

Were Darwin or his financiers around today, they 
would surely be deeply interested in the possibilities of 
exploring microbial life. Undoubtedly they would achieve 
this using metagenomics. In metagenomics, we isolate 
DNA directly from the environment and use it to charac- 
terize the taxonomy and function of the biological com- 
munity in that ecosystem. The power of this approach 
has been lauded by contemporary scientists such as 
Edward O. Wilson, who famously said "...if I could start 
my life over, I would work in microbial ecology" [2]. As a 
result of metagenomic analyses over the last 30 years, we 
have theorized and hypothesized that the whole microbial 
community acts as a network providing a vast array 
of ecosystem services to the macro-organisms in the ecosystem. 
Only now, and only with metagenomics, do we 
have the potential to produce a critical mass of data that 
will enable testing of these hypotheses. 

Darwin started making observations about organisms 
before using the gathered data to generate his theory on 
the origin of species via natural selection. Evolution as a 
theory and subsequently as a testable hypothesis was generated 
from his open-minded observations. Likewise, large-scale 
observational studies have aided the development of 
microbial ecology; for example, the Global Ocean Survey 
(GOS), a marine metagenomic transect survey of the 
world's oceans [3], has, without a doubt, been one of the 
most influential microbial ecology studies ever. Many studies 
have made use of the GOS dataset to generate 
hypotheses and draw conclusions about the biogeographic 
properties and functions of ocean microbial communities. 
Despite this clear impact, the GOS dataset is often vilified 
for poor experimental design and the absence of appropri- 
ate metadata necessary for analysis of the influence of 
environment on microbial diversity along the transect. We 
would agree that the data are B.A.D., but only insofar as 
they are the Best Available Data. As with Darwin's imperfect 
survey of animals and plants, they have significantly contributed 
to our understanding of, in this case, marine 
microbiology. GOS inspired a range of metagenomic 
research efforts, resulting in an explosion of metagenomic 
discovery voyages, from the human digestive tract 
(http://www.metahit.eu; http://nihroadmap.nih.gov/hmp/) 
to the soil (http://www.terragenome.org). Some new studies 
have undoubtedly used the observations made by the GOS 
to derive hypotheses; for example, TARA Oceans 
(http://oceans.taraexpeditions.org/) uses a similar experimental 
approach to GOS, but with statistical design and contextual 
metadata. 

The research community needs to carefully balance 
groundbreaking observational studies, however imperfect, 
with carefully designed experimental approaches. For 
example, the Earth Microbiome Project (EMP; 
http://www.earthmicrobiome.org) is using metagenomics to survey 
the largest distribution of samples ever attempted 
[4,5]. While driven by specific hypotheses, which will be 
tested by the data, this study is also fundamentally a 
voyage of discovery. The 200,000 planned environmental 
samples, sequenced for taxonomic and functional analysis, 
will undoubtedly generate hypotheses that are currently 
inconceivable. As with all data discovery, the way in which 
the data are analyzed and presented to the community will 
impact how they are used. Hence the EMP will reassemble 
microbial genomes to discover new physiology, produce 
metabolic maps to discover functional mechanisms, 
and explore taxonomic and protein space. The EMP is 
predicated on the value of voyages of discovery; and what 
a voyage we have before us! 

The vast imbalance between what it is possible to 
hypothesize and test, and what is unknown, means that 
every microbial ecologist is on an epic voyage of equal 
importance to that of Darwin. There are many funda- 
mental theories about microbial life that still need to be 
examined, and many of these can only be explored by 
intelligent sampling in unrestricted environmental 
surveys. Restricted analyses such as laboratory-based 
manipulation, culturing, PCR amplification and genome 
experimentation are very important to the understanding 
of microbial adaptation. However, lab experiments are 
artificial and hence it will always be necessary to contex- 
tualize these results with environmental observation, and 
DNA extraction bias notwithstanding, metagenomics is 
the most unrestricted and comprehensive approach. Our 
ability to interpret these data is always improving [6,7], 
and we stand on the brink of unprecedented discoveries, 
such as determining whether the global ocean contains a homogeneous 
pool of microbial genes acquired by billions of 
years of exchange and dispersal [1]. Microbes are not the 
only group to benefit from these surveys; viruses exist at 
10 times the abundance of microbes in virtually all eco- 
systems, and the only effective technique to examine the 
full breadth of their populations is unrestricted metagenomic 
survey, as no universal gene exists to allow amplicon 
surveys [8]. As viruses are the drivers of gene 
exchange, their characterization is equally necessary to 
answer many of the relevant questions. 

Thus, funding agencies and private foundations should 
not reject discovery studies that aim to explore the vast 
frontiers of microbial life in relatively unstructured ways. 
This 'dark matter' must be explored, albeit intelligently. 
This is not a call for blind, blanket surveys, but for 
exploring microbial communities at a supra-ecosystem 
level at a time when we are realizing that we know very 
little about the microbial world, yet understand increas- 
ingly that microbes drive ecological processes at all 
scales. 

Microbial ecology is rapidly evolving as a science. We 
need more and better surveys of every ecosystem and 
better standardization of ecosystem variables that can be 
used to relate biology to environment. The problem is 
vast, with more microbial life in the oceans than stars in 
the known universe [9], yet it is not insurmountable. 
There is increasing evidence, for example, for the "every- 
thing is everywhere, but the environment selects" theory 
being closer to the truth than anyone had previously con- 
ceived [1,10]. This is a very exciting time in global biodi- 
versity discovery, and we must not forget that 
observations play a major role in science; only with effec- 
tive observation can we develop testable hypotheses. It is 
already clear that microbial evolution, which is most of 
evolution, used some tricks that seem to be out of fashion 
among the larger organisms. So, who knows, perhaps stu- 
dies like GOS, TARA and the EMP will yield the next 
theory of evolution, and some young Census scientist the 
next Darwin? 